Optimal design

Which new samples to examine next depends on the goal of the investigator.

Maximum information

The differential entropy of the posterior is defined as:

$$ H(y|\mathbf{x}) = E[I(y|\mathbf{x})] = -\int_Y P(y|\mathbf{x}) \log P(y|\mathbf{x}) \, dy $$

To decide which point to examine next, we want to minimize the expected entropy of the posterior distribution after observing that point. Picking the single point with the largest expected reduction exactly is computationally expensive and tends to give poor results, so instead we sample candidate points stochastically.
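
A minimal sketch of this selection rule, assuming a Bayesian linear regression model with a hypothetical quadratic feature map and a known noise variance (none of which are specified in the notes): for a Gaussian-linear model the posterior entropy after observing at a point does not depend on the outcome, so the expected reduction can be computed exactly, and the next point is then drawn stochastically in proportion to it.

In [ ]:
import numpy as np

rng = np.random.default_rng(0)

def gaussian_entropy(S):
    """Differential entropy of N(m, S): 0.5 * log det(2*pi*e*S)."""
    d = S.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(S)[1])

def features(x):
    """Hypothetical feature map: quadratic polynomial in x."""
    return np.array([1.0, x, x**2])

sigma2 = 0.25                      # assumed known observation noise variance
S = np.eye(3)                      # current posterior covariance over the weights

candidates = np.linspace(-2, 2, 9)
gains = []
for x in candidates:
    phi = features(x)[:, None]
    # Posterior covariance after observing at x (independent of the outcome y
    # for a Gaussian-linear model, so the expected entropy reduction is exact).
    S_new = np.linalg.inv(np.linalg.inv(S) + phi @ phi.T / sigma2)
    gains.append(gaussian_entropy(S) - gaussian_entropy(S_new))
gains = np.array(gains)

# Sample the next design point stochastically, with probability
# proportional to a softmax over the expected entropy reductions.
probs = np.exp(gains - gains.max())
probs /= probs.sum()
next_x = rng.choice(candidates, p=probs)
print(dict(zip(np.round(candidates, 2), np.round(gains, 3))))
print("next point to examine:", next_x)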

Differential entropy can be negative (see the cell below), which means we need a better metric:

Relative entropy (KL divergence): probably the most useful metric here (Box, p. 62).

Change in entropy: the difference in entropy between the old posterior and the new posterior.

Relative entropy accounts for shifts in the posterior's position; change in entropy does not (illustrated in the sketch after the cell below). (Mihay)


In [25]:
import scipy.stats

# Differential entropy of a normal distribution grows with its scale.
print(scipy.stats.norm.entropy(0, 1))
print(scipy.stats.norm.entropy(0, 2))

# A uniform distribution narrower than width 1 has negative differential entropy.
print(scipy.stats.uniform.entropy(0, 0.5))
print(scipy.stats.uniform.entropy(0, 1))
print(scipy.stats.uniform.entropy(0, 2))


1.4189385332046727
2.112085713764618
-0.6931471805599453
0.0
0.6931471805599453
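
To illustrate the difference noted above: a posterior that shifts position without changing its spread has zero change in entropy but a strictly positive KL divergence from the old posterior. The sketch below uses the closed-form KL divergence between two univariate normals; the specific means and standard deviations are arbitrary choices for illustration.

In [ ]:
import numpy as np
import scipy.stats

def normal_kl(mu_new, sd_new, mu_old, sd_old):
    """KL( N(mu_new, sd_new) || N(mu_old, sd_old) ), closed form."""
    return (np.log(sd_old / sd_new)
            + (sd_new**2 + (mu_new - mu_old)**2) / (2 * sd_old**2)
            - 0.5)

old = (0.0, 1.0)        # old posterior: N(0, 1)
shifted = (1.5, 1.0)    # same spread, different position
narrowed = (0.0, 0.5)   # same position, smaller spread

for label, new in [("shifted", shifted), ("narrowed", narrowed)]:
    delta_h = scipy.stats.norm.entropy(*new) - scipy.stats.norm.entropy(*old)
    kl = normal_kl(new[0], new[1], old[0], old[1])
    print(f"{label:8s}  change in entropy = {delta_h:+.3f}   KL divergence = {kl:.3f}")

The shifted posterior has a change in entropy of exactly zero, while its KL divergence from the old posterior is positive, which is the distinction the notes draw between the two metrics.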

Maximum information over a set of models

Box instead optimizes for information gain over a class of candidate models. This is computationally simpler, but less appropriate for open-ended problems.
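
A sketch of one such criterion (an assumption, not necessarily Box's exact formulation): given a discrete set of hypothetical candidate models with current model probabilities and Gaussian predictive distributions, score each candidate design point by the probability-weighted divergence between the models' predictions there, and examine the point where the models disagree most.

In [ ]:
import numpy as np

# Hypothetical candidate models: each predicts a Gaussian outcome at x
# with a different mean function and a shared, known noise standard deviation.
models = {
    "linear":    lambda x: 1.0 + 0.5 * x,
    "quadratic": lambda x: 1.0 + 0.1 * x**2,
    "flat":      lambda x: 1.2 + 0.0 * x,
}
probs = {"linear": 0.4, "quadratic": 0.4, "flat": 0.2}  # current model probabilities
sd = 0.3

def normal_kl(mu1, mu0, sd_):
    """KL divergence between two normals with equal variance."""
    return (mu1 - mu0) ** 2 / (2 * sd_**2)

def discrimination_score(x):
    """Probability-weighted pairwise divergence of the predictive distributions at x."""
    names = list(models)
    score = 0.0
    for i in names:
        for j in names:
            if i != j:
                score += probs[i] * probs[j] * normal_kl(models[i](x), models[j](x), sd)
    return score

candidates = np.linspace(-2, 2, 41)
scores = np.array([discrimination_score(x) for x in candidates])
print("most discriminating design point:", round(candidates[np.argmax(scores)], 2))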

Long run optimization

Thompson sampling: repeatedly draw a parameter from the current posterior, act as if that draw were the truth, and update the posterior with the outcome.
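
A minimal Thompson sampling sketch for a Bernoulli bandit with Beta posteriors; the arms and their success probabilities are made up for illustration. At each step one sample is drawn from every arm's posterior, the arm with the highest draw is played, and its posterior is updated conjugately.

In [ ]:
import numpy as np

rng = np.random.default_rng(0)

true_p = [0.3, 0.5, 0.7]            # unknown success probabilities (simulation only)
alpha = np.ones(len(true_p))        # Beta posterior parameters per arm
beta = np.ones(len(true_p))

for t in range(2000):
    # Draw one sample from each arm's posterior and play the best-looking arm.
    draws = rng.beta(alpha, beta)
    arm = int(np.argmax(draws))
    reward = rng.random() < true_p[arm]
    # Conjugate update of the chosen arm's Beta posterior.
    alpha[arm] += reward
    beta[arm] += 1 - reward

plays = alpha + beta - 2            # number of times each arm was played
print("plays per arm:   ", plays.astype(int))
print("posterior means: ", np.round(alpha / (alpha + beta), 3))

Over the long run the algorithm concentrates its plays on the best arm while still occasionally exploring the others, which is why it appears under long-run optimization.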

Single-best optimization

Finite horizon vs. infinite horizon. (Mihay)

Cost functions

Constraints